We consider task allocation for multi-object transport by a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. Existing centralized methods assume that the numbers of robots and tasks are fixed, so they cannot be applied to scenarios that differ from the learning environment. Meanwhile, existing distributed methods limit the minimum numbers of robots and tasks to a constant value, which makes them applicable to various numbers of robots and tasks. However, they cannot transport an object whose weight exceeds the load capacity of the robots observing it. To make task allocation applicable to various numbers of robots and to objects with different and unknown weights, we propose a framework that uses multi-agent reinforcement learning for task allocation. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural network-based distributed policy model that determines the timing for coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. The policy is then optimized by multi-agent reinforcement learning through trial and error. This structured policy, combining local learning with global communication, makes our framework applicable to various numbers of robots and to objects with different and unknown weights, as demonstrated by numerical simulations.
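The abstract says the distributed policy builds consensus on the high-priority object from local observations. As a minimal sketch of what "reaching agreement on one object" means, assuming a plain max-consensus rule in place of the learned neural policy (the `Belief` tuple, `max_consensus`, and the neighbor graph are hypothetical illustrations, not the authors' method), each robot repeatedly adopts the best belief it hears from its neighbors:

```python
from typing import Dict, List, Tuple

# Hypothetical belief held by each robot: (priority, object_id).
# Tuples compare by priority first, then by object_id, so ties break deterministically.
Belief = Tuple[float, int]


def max_consensus(beliefs: Dict[int, Belief],
                  neighbors: Dict[int, List[int]],
                  rounds: int) -> Dict[int, Belief]:
    """Each robot adopts the highest-priority belief among itself and its neighbors.

    On a connected communication graph, all robots agree once the number of
    rounds reaches the graph diameter.
    """
    current = dict(beliefs)
    for _ in range(rounds):
        # Synchronous update: every robot reads the previous round's beliefs.
        current = {
            robot: max([current[robot]] + [current[n] for n in neighbors[robot]])
            for robot in current
        }
    return current
```

For four robots connected in a chain 0-1-2-3, a high-priority observation made only by robot 0 reaches robot 3 after three rounds, which is the diameter of that chain.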
In this paper, we present a solution to the problem of designing control strategies for multi-agent cooperative transport. Existing learning-based methods assume that the number of agents is the same as in the training environment, but in reality the number may differ: robots' batteries may fully discharge, or additional robots may be introduced to reduce the time required to complete a task. It is therefore crucial that the learned strategy be applicable to scenarios in which the number of agents differs from that in the training environment. We propose a novel multi-agent reinforcement learning framework of event-triggered communication and consensus-based control for distributed cooperative transport. The proposed policy model estimates the resultant force and torque in a consensus manner, using the estimates of the resultant force and torque exchanged with neighboring agents. Moreover, it computes control and communication inputs, the latter determining when to communicate with neighboring agents, from local observations and the estimates of the resultant force and torque. As a result, the proposed framework can balance control performance and communication savings in scenarios in which the number of agents differs from that in the training environment. We confirm the effectiveness of our approach using up to eight robots in simulations and up to six robots in experiments.
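To illustrate how consensus estimation and event-triggered communication fit together, here is a minimal sketch, assuming scalar force values, a fixed undirected neighbor graph, and a fixed broadcast threshold in place of the learned communication input (`event_triggered_consensus`, `eps`, and `threshold` are hypothetical names). Each agent updates its estimate using only the values its neighbors last broadcast, and broadcasts again only when its estimate has drifted far enough:

```python
from typing import Dict, List, Tuple


def event_triggered_consensus(x0: Dict[int, float],
                              neighbors: Dict[int, List[int]],
                              eps: float = 0.2,
                              threshold: float = 0.01,
                              steps: int = 200) -> Tuple[Dict[int, float], int]:
    """Average consensus with event-triggered broadcasts.

    x0 holds each agent's initial value (e.g. its own measured force component).
    Returns the final estimates and the total number of broadcasts sent.
    `eps` should be smaller than 1 / (maximum node degree) for stability.
    """
    x = dict(x0)       # current local estimates
    x_hat = dict(x0)   # last value each agent broadcast to its neighbors
    messages = 0
    for _ in range(steps):
        # Consensus step using only broadcast values. On an undirected graph the
        # pairwise terms cancel, so the sum (and hence the average) is preserved.
        x = {i: x[i] + eps * sum(x_hat[j] - x_hat[i] for j in neighbors[i])
             for i in x}
        # Event trigger: broadcast only when the estimate has drifted more than
        # `threshold` away from the last broadcast value.
        for i in x:
            if abs(x[i] - x_hat[i]) > threshold:
                x_hat[i] = x[i]
                messages += 1
    return x, messages
```

The estimates settle in a neighborhood of the true average whose size grows with `threshold`, which is the control-versus-communication trade-off the abstract describes. If the team size N were known, the resultant force would be N times this average; how the paper handles unknown N is not stated in the abstract.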
Diagnostic radiologists need artificial intelligence (AI) for medical imaging, but access to the medical images required to train AI has become increasingly restricted. To release and use medical images, we need an algorithm that simultaneously protects privacy and preserves pathologies in the images. To this end, we propose DP-GLOW, a hybrid of a local differential privacy (LDP) algorithm and GLOW, a flow-based deep generative model. Pixelwise correlation in images makes it difficult to protect privacy with straightforward LDP algorithms; applying a GLOW model disentangles this correlation. Specifically, we map images onto the latent vector of the GLOW model, each element of which follows an independent normal distribution, and apply the Laplace mechanism to the latent vector. Moreover, we apply DP-GLOW to chest X-ray images to generate LDP images while preserving pathologies.
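The Laplace step can be sketched on its own, assuming the GLOW latent vector is already available as a list of floats. The abstract does not say how sensitivity is bounded, so this sketch assumes each element is clipped to a hypothetical bound `clip`; the names `laplace_noise` and `privatize_latent` are illustrative, not from the paper:

```python
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_latent(z: List[float], epsilon: float, clip: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Apply the Laplace mechanism to a latent vector for epsilon-LDP.

    Clipping every element to [-clip, clip] (an assumption of this sketch) bounds
    the L1 sensitivity of the whole vector by 2 * clip * len(z); each element then
    receives Laplace noise with scale sensitivity / epsilon.
    """
    rng = rng or random.Random()
    clipped = [max(-clip, min(clip, v)) for v in z]
    sensitivity = 2.0 * clip * len(clipped)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]
```

In a full pipeline, the noisy latent vector would be passed back through the inverse GLOW flow to obtain the released image; that step depends on a trained model and is omitted here.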
Deep learning (DL) has become a driving force and has been widely adopted in many domains and applications with competitive performance. In practice, to solve nontrivial and complicated real-world tasks, DL is often not used standalone but serves as one component of a larger, complex AI system. Although there is a rapidly growing trend of studying the quality issues of deep neural networks (DNNs) at the model level, few studies have investigated the quality of DNNs at the unit level together with their potential impacts at the system level. More importantly, there is no systematic investigation of how to perform risk assessment for AI systems from the unit level to the system level. To bridge this gap, this paper presents an early exploratory study of AI system risk assessment from both the data distribution and uncertainty perspectives. We propose a general framework, together with an exploratory study, for analyzing AI systems. After large-scale experiments (700+ experimental configurations and 5000+ GPU hours) and in-depth investigations, we reached several key findings that highlight the practical need and opportunities for more in-depth investigations into AI systems.
When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Computer-assisted pronunciation training systems are therefore used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker as a model, in units of rhythm, phonemes, or words, and calculate the differences. However, they require extensive speech data with detailed annotations or can only compare against one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects beginners' mispronunciations based on a small amount of unannotated speech data, without comparison to a specific person. The proposed system uses deep learning-based speech processing to display, in an intuitive visual manner, the pronunciation score of the learner's speech and the difference/distance between the learner's pronunciation and that of a group of models. Learners can gradually improve their pronunciation by eliminating differences and shortening the distance from the models until they become sufficiently proficient. Furthermore, since the pronunciation score and difference/distance are not calculated by comparison with specific sentences spoken by a particular model speaker, users are free to study any sentences they wish. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.
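One way to picture "the difference/distance between the learner's and a group of models' pronunciation" is to compare a learner's speech embedding with the centroid of several model speakers' embeddings. This is a minimal sketch under that assumption; the abstract does not specify the embedding or the distance, and `centroid` and `distance_to_models` are hypothetical helpers operating on already-extracted feature vectors:

```python
import math
from typing import List, Sequence, Tuple


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the model speakers' embedding vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def distance_to_models(learner: Sequence[float],
                       models: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return the per-dimension difference from the model-group centroid and
    the Euclidean distance to it (smaller means closer to the models)."""
    c = centroid(models)
    diff = [l - m for l, m in zip(learner, c)]
    return diff, math.sqrt(sum(d * d for d in diff))
```

Comparing against a group centroid rather than a single reference speaker mirrors the abstract's point that learners are not tied to one specific native speaker.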
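Graph databases (GDBs) enable the processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the great significance of GDBs in both academia and industry, little effort has been made to integrate them with the predictive power of graph neural networks (GNNs). In this work, we show how to seamlessly combine nearly any GNN model with the computational capabilities of GDBs. For this, we observe that most of these systems are based on, or support, a graph data model called the Labeled Property Graph (LPG), in which vertices and edges can have arbitrarily complex sets of labels and properties. We then develop LPG2vec, an encoder that transforms an arbitrary LPG dataset into a representation that can be used directly with a broad class of GNNs, including convolutional, attentional, message-passing, and even higher-order or spectral models. In our evaluation, we show that LPG2vec correctly preserves the rich information represented as LPG labels and properties, and that it increases prediction accuracy by up to 34% compared with graphs without LPG labels/properties, regardless of the targeted learning task or the GNN model used. In general, LPG2vec makes it possible to combine the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases, a class of systems in which the vast complexity of the maintained data will benefit from modern and future graph machine learning methods.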
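To make the idea of turning LPG labels and properties into GNN-ready features concrete, here is a minimal sketch, assuming multi-hot encoding of vertex labels and raw numeric property values, with missing properties set to 0.0. The function `encode_vertices` and these encoding choices are hypothetical and are not LPG2vec's actual scheme:

```python
from typing import Dict, List, Set


def encode_vertices(labels: Dict[int, Set[str]],
                    props: Dict[int, Dict[str, float]]) -> Dict[int, List[float]]:
    """Build one fixed-length feature vector per vertex.

    The vector is a multi-hot encoding over all labels seen in the graph,
    followed by one slot per numeric property key (0.0 when absent).
    """
    label_vocab = sorted(set().union(*labels.values()))
    prop_keys = sorted(set().union(*(p.keys() for p in props.values())))
    features: Dict[int, List[float]] = {}
    for v in labels:
        multi_hot = [1.0 if lab in labels[v] else 0.0 for lab in label_vocab]
        # Missing properties are encoded as 0.0, an arbitrary choice for this sketch.
        values = [float(props.get(v, {}).get(k, 0.0)) for k in prop_keys]
        features[v] = multi_hot + values
    return features
```

The resulting vectors have the same length for every vertex, so they can be stacked into the node-feature matrix that most GNN layers expect. Real property values (strings, dates, lists) would need richer encodings than this.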
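Exponentially growing model sizes drive the continued success of deep learning, but they bring prohibitive computation and memory costs. From the algorithm perspective, model sparsification and quantization have been studied to alleviate the problem. From the architecture perspective, hardware vendors provide Tensor Cores for acceleration. However, obtaining practical speedups from sparse, low-precision matrix operations on Tensor Cores is very challenging because of strict data layout requirements and the lack of support for efficiently manipulating low-precision integers. We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor Cores. Magicube supports SpMM and SDDMM, the two major sparse operations in deep learning. Experimental results on an NVIDIA A100 GPU show that Magicube achieves on average a 1.44x (up to 2.37x) speedup over the vendor-optimized library for sparse kernels, and a 1.43x speedup over the state of the art, with comparable accuracy, for end-to-end sparse Transformer inference.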
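For readers unfamiliar with SpMM, here is a plain reference of its semantics for low-precision integers: a sparse matrix A in CSR form multiplied by a dense matrix B, both holding int8 values. This is not a Tensor Core kernel and not Magicube's API; `spmm_csr_int8` and `check_int8` are hypothetical names, and Python integers stand in for the wider accumulators (typically int32) used by int8 hardware:

```python
from typing import List, Sequence


def check_int8(xs: Sequence[int]) -> None:
    """Reject values that would not fit in a signed 8-bit integer."""
    for x in xs:
        if not -128 <= x <= 127:
            raise ValueError(f"{x} is outside the int8 range")


def spmm_csr_int8(indptr: List[int], indices: List[int], data: List[int],
                  B: List[List[int]]) -> List[List[int]]:
    """Compute C = A @ B, where A is given in CSR form (indptr, indices, data)."""
    check_int8(data)
    for row in B:
        check_int8(row)
    n_cols = len(B[0])
    C: List[List[int]] = []
    for r in range(len(indptr) - 1):
        acc = [0] * n_cols  # wide accumulator for row r of C
        # Only the stored nonzeros of row r contribute.
        for k in range(indptr[r], indptr[r + 1]):
            a, b_row = data[k], B[indices[k]]
            for j in range(n_cols):
                acc[j] += a * b_row[j]
        C.append(acc)
    return C
```

A high-performance library must produce the same result while reorganizing the nonzeros into the tile layouts Tensor Cores require, which is the data layout difficulty the abstract refers to.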
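Data augmentation is an important technique for improving recognition accuracy in object recognition with deep learning. Methods that generate mixed data from multiple data samples, such as Mixup, can introduce new diversity not contained in the training data and thus help improve accuracy. However, because the data to be mixed are selected randomly throughout the training process, appropriate classes or data are not selected in some cases. In this study, we propose a data augmentation method that calculates the distance between classes based on class probabilities and selects data from suitable classes to be mixed during training. The mixed data are dynamically adjusted according to the training trend of each class to facilitate training. The proposed method is used in combination with conventional methods for generating mixed data. Evaluation experiments show that the proposed method improves recognition performance on general and long-tailed image recognition datasets.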
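The abstract does not give the distance measure or the selection rule, so the following is only a sketch under stated assumptions: each class is represented by the mean predicted probability vector of its samples, class distance is Euclidean distance between those means, and the partner class is the nearest (most confusable) one. Standard Mixup is then applied to the chosen pair. The helpers `class_mean_probs`, `nearest_class`, and `mixup` are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def class_mean_probs(probs: Sequence[Sequence[float]], labels: Sequence[int],
                     n_classes: int) -> List[List[float]]:
    """Mean predicted probability vector for each ground-truth class."""
    sums = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for p, y in zip(probs, labels):
        counts[y] += 1
        for k in range(n_classes):
            sums[y][k] += p[k]
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(n_classes)]


def nearest_class(c: int, means: List[List[float]]) -> int:
    """Hypothetical partner rule: the class whose mean probability vector is
    closest (Euclidean) to that of class c, excluding c itself."""
    return min((k for k in range(len(means)) if k != c),
               key=lambda k: math.dist(means[c], means[k]))


def mixup(x1: Sequence[float], x2: Sequence[float], y1: int, y2: int,
          n_classes: int, lam: float) -> Tuple[List[float], List[float]]:
    """Standard Mixup: convex combination of inputs and of one-hot labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [0.0] * n_classes
    y[y1] += lam
    y[y2] += 1 - lam
    return x, y
```

Recomputing `class_mean_probs` from the model's current predictions (for example, once per epoch) is one way to let the partner selection follow each class's training trend, as the abstract describes.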
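Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O), which unifies region-based and object-centric pretraining. R2O trains an encoder to dynamically refine region-based segments into object-centric masks and then jointly learns representations of the contents within the masks. R2O uses a "region refinement module" to group small image regions, generated using a region-level prior, into larger regions that tend to correspond to objects by clustering region-level features. As training progresses, R2O follows a region-to-object curriculum that encourages learning region-level features early on and gradually progresses to training object-centric representations. Representations learned with R2O lead to state-of-the-art performance in semantic segmentation on PASCAL VOC (+0.7 mIoU) and Cityscapes (+0.4 mIoU) and in instance segmentation on MS COCO (+0.3 mask AP). Furthermore, after pretraining on ImageNet, R2O-pretrained models surpass the existing state of the art in unsupervised object segmentation on the Caltech-UCSD Birds 200-2011 dataset (+2.9 mIoU). We provide the code/models for this work at https://github.com/kkallidromitis/r2o.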
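The grouping step and the curriculum can be sketched as follows, assuming plain k-means over region-level feature vectors and a linear schedule that shrinks the number of clusters over training. Both choices, and the names `kmeans` and `num_masks`, are hypothetical; the abstract states only that small regions are grouped by clustering region-level features and that training moves from regions to objects:

```python
import math
import random
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20,
           seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster id for each region feature."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each region to its nearest center.
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(d) / len(members) for d in zip(*members)]
    return assign


def num_masks(epoch: int, total_epochs: int, k_start: int, k_end: int) -> int:
    """Hypothetical region-to-object curriculum: linearly shrink the number of
    clusters from k_start (many small regions) to k_end (few object-like masks)."""
    frac = epoch / max(total_epochs - 1, 1)
    return round(k_start + frac * (k_end - k_start))
```

Early epochs would call `kmeans` with many clusters, so masks stay close to the original small regions; later epochs use fewer clusters, merging regions into larger, object-like masks.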
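From birth to death, we all experience surprisingly ubiquitous changes due to aging. If we could predict aging in the digital domain, that is, with a digital twin of the human body, we would be able to detect lesions at a very early stage, thereby improving quality of life and extending lifespan. We observe that none of the previously developed digital twins of the adult human body explicitly trained longitudinal conversion rules between volumetric medical images with deep generative models, which may result in, for example, poor prediction performance for ventricular volumes. Here, we establish a new digital twin of the adult human body that is trained on longitudinally acquired head computed tomography (CT) images and predicts future volumetric head CT images from a single current volumetric head CT image. We are the first to adopt a three-dimensional flow-based deep generative model to realize such a sequential three-dimensional digital twin. We show that our digital twin outperforms the latest methods in predicting ventricular volumes over relatively short time spans.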